Introduction

What is this

This notebook will include informal meta-analyses of different metrics and methods for evaluating surgical skill.

The reported metrics compare differences between novices and expert surgeons.

It is informal because it is not based on a systematic review, and because some studies have been included under very relaxed conditions. For example, I picked the novice and expert groups without comparing their definitions between studies: novice = weakest skill group in the study, expert = strongest skill group in the study. If a study included more than 2 groups, I picked the results of the weakest (= novice) and strongest (= expert) groups and discarded the others. If a study included more than 1 task, or several sub-tasks, I picked the one with the largest difference between groups.

Many papers did not report means and standard deviations explicitly, so they had to be estimated from boxplots/barplots or by some other means.

For example, some studies reported only a mean or median, but no SE/SD. In those cases I estimated the SD/SE from, e.g., the SD of a similar metric reported in the same study, or the SD of previous results for the same metric. See the Excel file for notes on each study.

This may or may not be turned into a more systematic meta-analysis later.

Example metrics that will be most likely included (Bolded ones have priority)

  • Task time
  • Tool Path length
  • Tool Velocity
  • Tool Acceleration
  • Tool Curvature
  • Idle time
  • Pupil dilations
  • Blinks
  • Tool Movement efficiency
  • Number of movements
  • Tool Forces
  • Tool Torques
  • Bimanual dexterity
  • Jerk
  • Fixation duration
  • Saccade amplitudes
  • EEG?
  • Surgical Evaluation Instruments (SEI)

Full list of papers and metrics can be found in the excel file shared in the repo:

Link to Github repo

Last update: 19.7.2022. Added more results. Changed Laparoscopy -> Endoscopy, so all endoscopic procedures are now labeled ‘endoscopy’.

Submit results

If you notice errors or know some good studies to be included, feel free to forward them to

jani.koskinen [ at ] uef.fi

or use the form below TBD

How results are calculated

  1. From each study, extract
  • Number of trials per group (Nn, Ne, for novices and experts, respectively)
  • Means per group (Mn, Me for novices and experts, respectively)
  • Standard deviations per group (SDn, SDe)
  2. Calculate pooled standard deviation SDpooled
  3. Normalize by calculating Standardized Mean Difference (SMD): (Mn - Me)/SDpooled
  4. Calculate small sample size correction g = SMD*(1 - 3/(4n - 9)), where n is the total sample size of the study (both groups combined).
  5. Calculate SDg, standard deviation after correction
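The steps above can be sketched in base R as follows. This is a minimal sketch, not necessarily the exact calculation used in the Excel file: the pooled SD uses the usual degrees-of-freedom weighting, and the formula for SDg (large-sample standard error of the SMD, scaled by the correction factor) is an assumption, because the text does not spell it out. The function name hedges_g is hypothetical.

```r
# Sketch of steps 2-5. Assumptions (not stated in the text):
# - pooled SD weighted by (N - 1) per group
# - SDg taken as the large-sample standard error of the SMD times the correction factor
hedges_g <- function(Nn, Ne, Mn, Me, SDn, SDe) {
  n <- Nn + Ne                                   # total sample size
  SDpooled <- sqrt(((Nn - 1) * SDn^2 + (Ne - 1) * SDe^2) / (n - 2))
  SMD <- (Mn - Me) / SDpooled                    # standardized mean difference
  J <- 1 - 3 / (4 * n - 9)                       # small sample size correction
  g <- SMD * J
  SE_SMD <- sqrt(n / (Nn * Ne) + SMD^2 / (2 * n))
  list(SDpooled = SDpooled, SMD = SMD, g = g, SDg = J * SE_SMD)
}

# Hypothetical study: 10 novice and 10 expert trials
res <- hedges_g(Nn = 10, Ne = 10, Mn = 12, Me = 10, SDn = 2, SDe = 2)
print(res)
```

With these made-up numbers the pooled SD is 2 and the SMD is 1, so g is simply the correction factor 1 - 3/71.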

These values are used as input in the R meta package’s metagen function.

For more information, check:

Doing Meta-Analysis with R: A Hands-On Guide

Forest plots

Forest plot explanation

Sample size estimation (Work in progress)

How many samples are needed to detect an effect of size d? The estimates assume a two-sample t-test with alpha = 0.05 and power = 0.8, and independent trials (e.g. no repeated measurements from the same participants).

Hover the mouse over the points in the plot to see the values. The sample size is per group, so you need this many samples in each group.
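As a rough illustration of this kind of calculation, the per-group sample size can be obtained with power.t.test from base R's stats package. This is a sketch under the assumptions stated above (two-sample, two-sided t-test, alpha = 0.05, power = 0.8); the effect sizes are hypothetical examples, and the plot may have been produced differently.

```r
# Per-group sample size for a two-sample t-test at alpha = 0.05, power = 0.8.
# With sd = 1, delta is the standardized effect size d.
effect_sizes <- c(0.5, 1, 1.5, 2)   # hypothetical example values

n_per_group <- sapply(effect_sizes, function(d) {
  res <- power.t.test(delta = d, sd = 1, sig.level = 0.05, power = 0.8,
                      type = "two.sample", alternative = "two.sided")
  ceiling(res$n)                    # round up to whole samples
})

print(data.frame(d = effect_sizes, n_per_group = n_per_group))
```

Larger effects need fewer samples, which is why metrics with large pooled effect sizes are attractive for small studies.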

Some effect sizes from the meta-analyses are given as baselines:

IT = Idle Time

TT = Task Time

BD = Bimanual Dexterity

TEPR = Task-Evoked Pupil Reaction/Dilation (estimated with one outlier study removed)

TJ = Tool Jerk

TF = Tool Force

Task Time

Task time is the time taken to complete a task. The task can be short, like tying a single knot, or a longer and more complex procedure.

Studies

Load data

library(readxl)  # provides read_excel()
df.time <- read_excel('data/surgical_metrics.xlsx', sheet='task_time')

Print studies

Author | Year | Study | Journal | Note
Koskinen et al. | 2022 | Utilizing Grasp Monitoring to Predict Microsurgical Expertise | Journal of Surgical Research | NA
Chainey et al. | 2021 | Eye-Hand Coordination of Neurosurgeons: Evidence of Action-Related Fixation in Microsuturing | World Neurosurgery | NA
Harada et al. | 2015 | Assessing microneurosurgical skill with medico-engineering technology | World Neurosurgery | Effects estimated from boxplot
Vedula et al. | 2016 | Task-Level vs. Segment-Level Quantitative Metrics for Surgical Skill Assessment | Journal of Surgical Education | Effects estimated from barplot. Sample size per group not given, estimated from total sample (135 trials total, 4 experts, 14 novices, expert sample size rounded from (4/18)*135)
Judkins et al. | 2009 | Objective evaluation of expert and novice performance during robotic surgical training tasks | Surgical Endoscopy | Effect estimated from barplot. Novices pre-training, three trials each, five novices and five experts
Smith et al. | 2002 | Motion analysis: A tool for assessing laparoscopic dexterity in the performance of a laboratory-based laparoscopic cholecystectomy | Surgical Endoscopy and Other Interventional Techniques | Worst and best groups compared, novices have performed < tasks, experts >100
Francis et al. | 2002 | The performance of master surgeons on the Advanced Dundee Endoscopic Psychomotor Tester: Contrast validity study | Archives of Surgery | Effects estimated from boxplots
Moorthy et al. | 2004 | Bimodal assessment of laparoscopic suturing skills: Construct and concurrent validity | Surgical Endoscopy and Other Interventional Techniques | NA
Van Sickle et al. | 2008 | Construct validity of an objective assessment method for laparoscopic intracorporeal suturing and knot tying | The American Journal of Surgery | The expert group had only 2 trials and vastly outperformed the other groups (task time 15.6 sec!). I therefore compared against the trained residents (second most experienced group) instead
Xeroulis et al. | 2009 | Simulation in laparoscopic surgery: A concurrent validity study for FLS | Surgical Endoscopy and Other Interventional Techniques | Effect sizes estimated from barplot
Huffman et al. | 2020 | Optimizing Assessment of Surgical Knot Tying Skill | Journal of Surgical Education | By hand, did not use instruments
Law et al. | 2004 | Eye gaze patterns differentiate novice and experts in a virtual laparoscopic surgery training environment | Proceedings of the Eye tracking research & applications symposium on Eye tracking research & applications - ETRA’2004 | NA
Kazemi et al. | 2010 | Assessing suturing techniques using a virtual reality surgical simulator | Microsurgery | Task completed in VR. Medical students and medical surgeons compared. Times estimated from barplot
O’Toole et al. | 1999 | Measuring and Developing Suturing Technique with a Virtual Reality Surgical Simulator | Journal of the American College of Surgeons | Virtual reality, times from the trial taken after training
Zheng et al. | 2021 | Action-related eye measures to assess surgical expertise | BJS Open | Transporting and loading task
Datta et al. | 2001 | The use of electromagnetic motion tracking analysis to objectively measure open surgical skill in the laboratory-based model | Journal of the American College of Surgeons | Used ICSAD system to record data. Several skill groups, here we compare basic surgical trainees and consultants
Pagador et al. | 2012 | Decomposition and analysis of laparoscopic suturing task using tool-motion analysis (TMA): Improving the objective assessment | International Journal of Computer Assisted Radiology and Surgery | First subtask results
Aggarwal et al. | 2007 | An evaluation of the feasibility, validity, and reliability of laparoscopic skills assessment in the operating room | Annals of Surgery | Whole procedure, paper reports medians and inter-quartile ranges, the SDs are calculated from these (IQR*(3/4))
Wilson et al. | 2010 | Psychomotor control in a virtual laparoscopic surgery training environment: Gaze control parameters differentiate novices from experts | Surgical Endoscopy | NA
Hofstad et al. | 2013 | A study of psychomotor skills in minimally invasive surgery: What differentiates expert and nonexpert performance | Surgical Endoscopy and Other Interventional Techniques | Estimated effects and SDs from barplots. Reports left/right hand separately, I used left hand results
Hung et al. | 2018 | Development and Validation of Objective Performance Metrics for Robot-Assisted Radical Prostatectomy: A Pilot Study | Journal of Urology | Values given as mean and 95% conf interval. SD calculated from conf interval by sqrt(N)*(upper lim - lower lim)/3.92
Yamaguchi et al. | 2011 | Objective assessment of laparoscopic suturing skills using a motion-tracking system | Surgical Endoscopy | Results for the whole procedure
Pellen et al. | 2009 | Laparoscopic surgical skills assessment: Can simulators replace experts? | World Journal of Surgery | Estimated effects and SDs from boxplots
Pastewski et al. | 2021 | Analysis of Instrument Motion and the Impact of Residency Level and Concurrent Distraction on Laparoscopic Skills | Journal of Surgical Education | Used results for the condition without a secondary task
Chmarra et al. | 2010 | Objective classification of residents based on their psychomotor laparoscopic skills | Surgical Endoscopy and Other Interventional Techniques | Values estimated from plots, used the pipe cleaner task results
Rittenhouse et al. | 2014 | Design and validation of an assessment tool for open surgical procedures | Surgical Endoscopy | Used Wii (IR sensor) and Patriot EM tracking. Results are for the Patriot tracking system. Values estimated from barplot (Fig. 6)
Mackenzie et al. | 2021 | Enhanced Training Benefits of Video Recording Surgery With Automated Hand Motion Analysis | World Journal of Surgery | Values given as means and ranges. Compared experts and residents post-training. SD for idle time not given, estimated from variance of total active time
Mazomenos et al. | 2016 | Catheter manipulation analysis for objective performance and technical skills assessment in transcatheter aortic valve implantation | International Journal of Computer Assisted Radiology and Surgery | Task was performed with conventional tools and with robotic tools. Results are for conventional tools. There were 2 stages, results here are for stage 1. SDs evaluated from boxplots (Fig. 5). Expert jerk weirdly small?
Amiel et al. | 2020 | Experienced surgeons versus novice surgery residents: Validating a novel knot tying simulator for vessel ligation | Surgery | 4 different knot types, each completed twice. 15 experts and 30 novices. Results are for the deep two hand knot (Fig. 2). Effects estimated from the plot, for Total Force
Balasundaram et al. | 2022 | Acquisition of microvascular suturing techniques is feasible using objective measures of performance outside of the operating room | British Journal of Oral and Maxillofacial Surgery | Results for novices are for post-intervention (training), Fig. 5. Effects estimated from the figure
Franco-González et al. | 2021 | Development of a 3D Motion Tracking System for the Analysis of Skills in Microsurgery | Journal of Medical Systems | Values are for the suturing task
Berges et al. | 2022 | Eye Tracking and Motion Data Predict Endoscopic Sinus Surgery Skill | Laryngoscope | Participants completed 9 tasks. Results are for total time
Saleh et al. | 2006 | Evaluating surgical dexterity during corneal suturing | Archives of Ophthalmology | Values given as medians and inter-quartile ranges. Values are for novice and expert surgeons (Table)
Balal et al. | 2019 | Computer analysis of individual cataract surgery segments in the operating room | Eye (Basingstoke) | Results from Table 1 for CCC
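
Several notes in the table above describe how missing SDs and sample sizes were approximated. The helpers below collect those rules in one place as a sketch. The function names are hypothetical and the input values are made up, except for the Vedula et al. trial split, which uses the numbers from the note.

```r
# SD approximated from the inter-quartile range, as in the notes: SD = IQR * (3/4)
sd_from_iqr <- function(q1, q3) (q3 - q1) * 3 / 4

# SD recovered from a 95% confidence interval of the mean:
# SD = sqrt(N) * (upper - lower) / 3.92
sd_from_ci95 <- function(lower, upper, n) sqrt(n) * (upper - lower) / 3.92

# Trials per group estimated from the share of participants in that group
split_trials <- function(total_trials, n_group, n_all) {
  round(total_trials * n_group / n_all)
}

sd_iqr <- sd_from_iqr(q1 = 10, q3 = 18)                 # hypothetical quartiles
sd_ci  <- sd_from_ci95(lower = 8, upper = 12, n = 16)   # hypothetical interval
n_exp  <- split_trials(135, 4, 18)                      # Vedula et al.: experts
n_nov  <- split_trials(135, 14, 18)                     # Vedula et al.: novices
print(c(sd_iqr = sd_iqr, sd_ci = sd_ci, n_exp = n_exp, n_nov = n_nov))
```

These are rough approximations. The IQR and CI rules both assume roughly normal data, so they should be read as order-of-magnitude estimates for skewed metrics such as task time.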

Results

library(meta)  # provides metagen() and forest.meta()
m.time <- metagen(TE=g,
                 seTE=SDg,
                 studlab=Author,
                 data=df.time,
                 sm="SMD",
                 fixed=FALSE,
                 random=TRUE,
                 method.tau="REML",
                 hakn=TRUE,
                 title="Time to completion in Surgery")
summary(m.time)
## Review:     Time to completion in Surgery
## 
##                           SMD             95%-CI %W(random)
## Koskinen et al.        1.8413 [ 1.4135;  2.2691]        3.4
## Chainey et al.         0.7034 [ 0.0383;  1.3686]        3.3
## Harada et al.          1.5503 [ 0.8551;  2.2456]        3.3
## Vedula et al.          2.2149 [ 1.7299;  2.6999]        3.4
## Judkins et al.         5.3971 [ 3.8216;  6.9726]        2.7
## Smith et al.           8.0559 [ 5.5551; 10.5568]        2.0
## Francis et al.         0.9801 [ 0.3227;  1.6375]        3.3
## Moorthy et al.         1.4157 [ 0.3397;  2.4917]        3.1
## Van Sickle et al.      2.1365 [ 1.0202;  3.2528]        3.0
## Xeroulis et al.        2.5525 [ 1.2650;  3.8400]        2.9
## Huffman et al.         6.5116 [ 4.9174;  8.1059]        2.7
## Law et al.             2.0257 [ 1.3401;  2.7112]        3.3
## Kazemi et al.          0.8354 [-0.3084;  1.9791]        3.0
## O'Toole et al.         1.7086 [ 0.6569;  2.7602]        3.1
## Zheng et al.           1.9382 [ 0.6894;  3.1871]        2.9
## Datta et al.           2.1791 [ 1.1762;  3.1819]        3.1
## Pagador et al.         6.3695 [ 2.5221; 10.2170]        1.3
## Aggarwal et al.        0.1873 [-0.4390;  0.8136]        3.3
## Wilson et al.          1.3349 [ 0.1520;  2.5179]        3.0
## Hofstad et al.         1.3791 [ 0.3199;  2.4382]        3.1
## Hung et al.            2.2342 [ 1.7203;  2.7481]        3.4
## Yamaguchi et al.       4.5870 [ 2.7625;  6.4116]        2.5
## Pellen et al.          5.6362 [ 3.6128;  7.6596]        2.4
## Pastewski et al.       0.5600 [-0.1171;  1.2371]        3.3
## Chmarra et al.         0.9810 [ 0.0706;  1.8914]        3.2
## Rittenhouse et al.     3.9042 [ 2.1253;  5.6832]        2.5
## Mackenzie et al.       0.5613 [-1.5135;  2.6361]        2.3
## Mazomenos et al.       2.3487 [ 0.8266;  3.8708]        2.7
## Amiel et al.           1.8403 [ 1.3249;  2.3556]        3.4
## Balasundaram et al.    0.8384 [-0.0792;  1.7559]        3.2
## Franco-González et al. 1.7391 [ 0.4474;  3.0309]        2.9
## Berges et al.          1.0838 [ 0.7576;  1.4101]        3.4
## Saleh et al.           4.7016 [ 2.9459;  6.4574]        2.6
## Balal et al.           1.6436 [ 0.9231;  2.3642]        3.3
## 
## Number of studies combined: k = 34
## 
##                         SMD           95%-CI    t  p-value
## Random effects model 2.2474 [1.6263; 2.8684] 7.36 < 0.0001
## 
## Quantifying heterogeneity:
##  tau^2 = 2.3052 [1.5556; 5.6252]; tau = 1.5183 [1.2472; 2.3717]
##  I^2 = 84.4% [79.1%; 88.3%]; H = 2.53 [2.19; 2.93]
## 
## Test of heterogeneity:
##       Q d.f.  p-value
##  211.46   33 < 0.0001
## 
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
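
The I^2 and H values in the output above follow directly from Cochran's Q and the number of studies. The short check below recomputes them from the printed Q = 211.46 and k = 34. The function name heterogeneity is hypothetical, and the formulas are the standard definitions rather than code taken from the meta package.

```r
# I^2 = (Q - df) / Q (floored at 0) and H = sqrt(Q / df), with df = k - 1
heterogeneity <- function(Q, k) {
  df <- k - 1
  c(I2 = max(0, (Q - df) / Q), H = sqrt(Q / df))
}

het.time <- heterogeneity(Q = 211.46, k = 34)
print(round(het.time, 3))
```

The result agrees with the reported I^2 = 84.4% and H = 2.53, which means most of the observed variation between studies is attributed to heterogeneity rather than sampling error.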

Plot forest

forest.meta(m.time, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Time to completion in Surgery")

#dev.print(pdf, "figures/forest_time.pdf", width=10, height=10)

Meta-regression (testing)

With enough results, we can do regression analysis to compare e.g. how the effect sizes differed between surgical techniques.

First, plot by surgical technique (red labels show the number of studies):

library(ggplot2)
# Label helper: print the number of studies at y = 0 for each technique
n_obs <- function(x){
  return(c(y=0, label=length(x)))
}
ggplot(df.time, aes(x=Technique, y=SMD)) +
  geom_boxplot() +
  stat_summary(fun.data = n_obs, colour = "red", size = 5, geom = "text")

Fit linear model with Technique as explanatory variable. Microsurgery effect size is used as baseline (intercept).

df.time$Technique <- as.factor(df.time$Technique)
df.time <- within(df.time, Technique <- relevel(Technique, ref="Microsurgery"))
lm.time <- lm(SMD ~ Technique, data=df.time)
summary(lm.time)
## 
## Call:
## lm(formula = SMD ~ Technique, data = df.time)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5495 -1.2221 -0.6900  0.0604  5.6037 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                1.7962     0.7491   2.398   0.0229 *
## TechniqueEndoscopy         0.9438     0.8930   1.057   0.2990  
## TechniqueOpen surgery      1.1377     1.2974   0.877   0.3875  
## TechniqueRobotic Surgery   1.5461     1.4344   1.078   0.2897  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.119 on 30 degrees of freedom
## Multiple R-squared:  0.05368,    Adjusted R-squared:  -0.04095 
## F-statistic: 0.5672 on 3 and 30 DF,  p-value: 0.6409
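
The linear model above weights every study equally. One possible alternative, sketched here and not used for the results in this notebook, is metareg from the meta package, which weights studies by their precision and accounts for between-study variance. The data frame toy and all its values are hypothetical. The sketch assumes the meta package (and its metafor dependency) is installed.

```r
library(meta)

# Hypothetical toy data: corrected effect sizes, their standard errors, and technique
toy <- data.frame(
  Author    = paste("Study", 1:8),
  g         = c(1.8, 0.7, 1.6, 2.2, 2.5, 1.4, 3.0, 2.1),
  SDg       = c(0.22, 0.34, 0.35, 0.25, 0.66, 0.55, 0.80, 0.51),
  Technique = rep(c("Microsurgery", "Endoscopy"), each = 4)
)
# Use Microsurgery as the baseline (intercept), as in the linear model
toy$Technique <- relevel(factor(toy$Technique), ref = "Microsurgery")

m.toy <- metagen(TE = g, seTE = SDg, studlab = Author, data = toy,
                 sm = "SMD", method.tau = "REML")

# Meta-regression with Technique as a categorical moderator
mr.toy <- metareg(m.toy, ~ Technique)
print(mr.toy)
```

The intercept is then the pooled effect for the baseline technique and the remaining coefficient is the estimated difference from it.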

Discussion

Time to completion is by far the most often reported metric. It is often reported even when it is not the main focus of the study.

Bimanual Dexterity

Bimanual dexterity is a measure of how well the surgeon is able to use both hands at the same time. Note that there are many different ways for calculating “ability to use both hands simultaneously.”

Studies

Load data

df.biman <- read_excel('data/surgical_metrics.xlsx', sheet='tool_bimanual')

Print studies

Author | Year | Study | Journal | Note
Koskinen et al. | 2022 | Movement-level process modeling of microsurgical bimanual and unimanual tasks | International Journal of Computer Assisted Radiology and Surgery | Bimanual efficiency defined as using both hands simultaneously for something productive
Hofstad et al. | 2017 | Psychomotor skills assessment by motion analysis in minimally invasive surgery on an animal organ | Minimally Invasive Therapy and Allied Technologies | Bimanual dexterity defined as the correlation between the two hands' tool movements. Values estimated from boxplots
Demirel et al. | 2022 | Scoring metrics for assessing skills in arthroscopic rotator cuff repair: performance comparison study of novice and expert surgeons | International Journal of Computer Assisted Radiology and Surgery | Standard deviations estimated from the standard deviations of other metrics, not given directly in the paper
Islam et al. | 2016 | Affordable, web-based surgical skill training and evaluation tool | Journal of Biomedical Informatics | Mean values estimated from boxplot. Standard deviations were not given, I used similar-ish values as in our study (i = 0), so the novices' SD is about 1/5 of the mean and the experts' is 1/12
Zulbaran-Rojas et al. | 2021 | Utilization of Flexible-Wearable Sensors to Describe the Kinematics of Surgical Proficiency | Journal of Surgical Research | I took the ratio of the number of dominant and non-dominant hand movements as the measure of bimanual dexterity. Other options were velocity and path length. Number of movements felt closest to our definition
Mori et al. | 2022 | Validation of a novel virtual reality simulation system with the focus on training for surgical dissection during laparoscopic sigmoid colectomy | BMC Surgery | Bimanual dexterity measured in GOALS score (see paper for more information). Results given as medians and inter-quartile ranges. SD calculated from IQR as SD = IQR*(3/4)
Franco-González et al. | 2021 | Development of a 3D Motion Tracking System for the Analysis of Skills in Microsurgery | Journal of Medical Systems | Values are for the suturing task

Results

Run meta-analysis

m.biman <- metagen(TE=g,
                 seTE=SDg,
                 studlab=Author,
                 data=df.biman,
                 sm="SMD",
                 fixed=FALSE,
                 random=TRUE,
                 method.tau="REML",
                 hakn=TRUE,
                 title="Bimanual dexterity in Surgery")

Print results

summary(m.biman)
## Review:     Bimanual dexterity in Surgery
## 
##                            SMD              95%-CI %W(random)
## Koskinen et al.        -3.0589 [ -3.8825; -2.2353]       14.9
## Hofstad et al.         -3.0127 [ -4.6473; -1.3782]       13.7
## Demirel et al.         -2.0314 [ -3.0378; -1.0251]       14.7
## Islam et al.           -8.6969 [-10.7900; -6.6039]       12.7
## Zulbaran-Rojas et al.  -0.8250 [ -1.7586;  0.1085]       14.8
## Mori et al.            -2.6867 [ -3.6936; -1.6799]       14.7
## Franco-González et al. -1.2364 [ -2.4340; -0.0387]       14.4
## 
## Number of studies combined: k = 7
## 
##                          SMD             95%-CI     t p-value
## Random effects model -2.9716 [-5.3010; -0.6422] -3.12  0.0205
## 
## Quantifying heterogeneity:
##  tau^2 = 5.4276 [1.9426; 32.4354]; tau = 2.3297 [1.3938; 5.6952]
##  I^2 = 88.7% [79.2%; 93.9%]; H = 2.98 [2.19; 4.04]
## 
## Test of heterogeneity:
##      Q d.f.  p-value
##  53.15    6 < 0.0001
## 
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model

Plot forest

forest.meta(m.biman, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Bimanual dexterity in Surgery")

#dev.print(pdf, "figures/forest_biman.pdf", width=8, height=8)

Discussion

Analysis of bimanual dexterity is made harder because there are so many different definitions for it.

Tool Movements

Number of tool movements made during the task. Note: I have included here the grasp results from our paper (and other studies that analyzed only one type of action/movement)

Studies

Load data

df.toolmvt <- read_excel('data/surgical_metrics.xlsx', sheet='tool_movements')

Print studies

Author | Year | Study | Journal | Note
Datta et al. | 2001 | The use of electromagnetic motion tracking analysis to objectively measure open surgical skill in the laboratory-based model | Journal of the American College of Surgeons | Used ICSAD system to record data. Several skill groups, here we compare basic surgical trainees and consultants
Pagador et al. | 2012 | Decomposition and analysis of laparoscopic suturing task using tool-motion analysis (TMA): Improving the objective assessment | International Journal of Computer Assisted Radiology and Surgery | Study reported left and right hand movements separately, I picked left hand
Koskinen et al. | 2022 | Utilizing Grasp Monitoring to Predict Microsurgical Expertise | Journal of Surgical Research | Grasps
Bann et al. | 2003 | Measurement of surgical dexterity using motion analysis of simple bench tasks | World Journal of Surgery | Used ICSAD system to record data. Reports medians and inter-quartile ranges
Smith et al. | 2002 | Motion analysis: A tool for assessing laparoscopic dexterity in the performance of a laboratory-based laparoscopic cholecystectomy | Surgical Endoscopy and Other Interventional Techniques | Multiple tasks, picked Calot’s triangle. Surgeon groups A and C compared
Aggarwal et al. | 2007 | An evaluation of the feasibility, validity, and reliability of laparoscopic skills assessment in the operating room | Annals of Surgery | Whole procedure, paper reports medians and inter-quartile ranges, the SDs are calculated from these (IQR*(3/4))
Yamaguchi et al. | 2007 | Construct validity for eye-hand coordination skill on a virtual reality laparoscopic surgical simulator | Surgical Endoscopy and Other Interventional Techniques | Effects and SDs estimated from barplots. Reported right hand movements
Goldbraikh et al. | 2021 | Video-based fully automatic assessment of open surgery suturing skills | International Journal of Computer Assisted Radiology and Surgery | Task: Balloon dominant hand
Vedula et al. | 2016 | Task-Level vs. Segment-Level Quantitative Metrics for Surgical Skill Assessment | Journal of Surgical Education | Effects and SDs estimated from barplots. Paper does not give Ne/Nn directly, total of 135 trials performed by 14 novices and 4 experts, so I estimated sample sizes by 135*(14/(14+4)) for novices and 135*(4/(14+4)) for experts
Wilson et al. | 2010 | Psychomotor control in a virtual laparoscopic surgery training environment: Gaze control parameters differentiate novices from experts | Surgical Endoscopy | Reported left and right hand separately, I used left hand because differences are usually larger with the non-dominant hand (all were right-handed)
Hofstad et al. | 2013 | A study of psychomotor skills in minimally invasive surgery: What differentiates expert and nonexpert performance | Surgical Endoscopy and Other Interventional Techniques | Estimated effects and SDs from barplots. Reports left/right hand separately, I used left hand results
Rittenhouse et al. | 2014 | Design and validation of an assessment tool for open surgical procedures | Surgical Endoscopy | Used Wii (IR sensor) and Patriot EM tracking. Results are for the Patriot tracking system. Values estimated from barplot (Fig. 6)
Balasundaram et al. | 2022 | Acquisition of microvascular suturing techniques is feasible using objective measures of performance outside of the operating room | British Journal of Oral and Maxillofacial Surgery | Results for novices are for post-intervention (training), Fig. 5. Effects estimated from the figure
Franco-González et al. | 2021 | Development of a 3D Motion Tracking System for the Analysis of Skills in Microsurgery | Journal of Medical Systems | Values are for the suturing task
Saleh et al. | 2006 | Evaluating surgical dexterity during corneal suturing | Archives of Ophthalmology | Values given as medians and inter-quartile ranges. Values are for novice and expert surgeons (Table)
Balal et al. | 2019 | Computer analysis of individual cataract surgery segments in the operating room | Eye (Basingstoke) | Results from Table 1 for CCC

Results

Run meta-analysis

m.toolmvt <- metagen(TE=g,
                 seTE=SDg,
                 studlab=Author,
                 data=df.toolmvt,
                 sm="SMD",
                 fixed=FALSE,
                 random=TRUE,
                 method.tau="REML",
                 hakn=TRUE,
                 title="Tool movements in Surgery")
summary(m.toolmvt)
## Review:     Tool movements in Surgery
## 
##                            SMD             95%-CI %W(random)
## Datta et al.            2.0390 [ 1.0607;  3.0174]        6.6
## Pagador et al.         10.0866 [ 4.2364; 15.9368]        1.9
## Koskinen et al.         1.3393 [ 0.7781;  1.9006]        7.0
## Bann et al.             1.2504 [ 0.4629;  2.0380]        6.8
## Smith et al.            5.9403 [ 4.0136;  7.8671]        5.5
## Aggarwal et al.        -0.0641 [-0.6894;  0.5612]        6.9
## Yamaguchi et al.        2.3074 [ 1.3887;  3.2260]        6.7
## Goldbraikh et al.       2.6143 [ 1.8535;  3.3751]        6.8
## Vedula et al.           6.5233 [ 5.6418;  7.4047]        6.7
## Wilson et al.           0.5955 [-0.4889;  1.6799]        6.5
## Hofstad et al.          0.9586 [-0.0444;  1.9617]        6.6
## Rittenhouse et al.      4.2589 [ 2.3738;  6.1439]        5.6
## Balasundaram et al.     2.0810 [ 0.9757;  3.1864]        6.5
## Franco-González et al.  1.5547 [ 0.3003;  2.8091]        6.3
## Saleh et al.            1.9550 [ 0.8741;  3.0360]        6.5
## Balal et al.            1.5181 [ 0.8115;  2.2248]        6.9
## 
## Number of studies combined: k = 16
## 
##                         SMD           95%-CI    t p-value
## Random effects model 2.4092 [1.2764; 3.5421] 4.53  0.0004
## 
## Quantifying heterogeneity:
##  tau^2 = 3.3007 [1.8235; 13.0443]; tau = 1.8168 [1.3504; 3.6117]
##  I^2 = 92.3% [89.1%; 94.6%]; H = 3.61 [3.03; 4.30]
## 
## Test of heterogeneity:
##       Q d.f.  p-value
##  195.00   15 < 0.0001
## 
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model

Plot forest

forest.meta(m.toolmvt, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Tool movements in Surgery")

#dev.print(pdf, "figures/forest_toolmvt.pdf", width=8, height=8)

Discussion

Tool movements are perhaps the second most often reported metric. Different papers measure, analyze and report them differently. They are often connected to “movement efficiency”.

Tool Idle Time

Tool idle time measures how long the tools were not being used, either as time or as fraction of the complete task time.
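
As an illustration of the two ways of expressing idle time, the sketch below computes it from a sampled tool speed signal. Treating samples below a speed threshold as idle is an assumption made for this example only. The included studies define and detect idle periods in different ways, and the function name, threshold, and data are hypothetical.

```r
# Idle time from a speed trace: samples below `threshold` are counted as idle.
# `dt` is the sampling interval in seconds.
idle_time <- function(speed, dt, threshold) {
  idle_seconds <- sum(speed < threshold) * dt
  total_seconds <- length(speed) * dt
  c(seconds = idle_seconds, fraction = idle_seconds / total_seconds)
}

speed <- c(0, 0, 5, 8, 1, 0, 6, 7, 0, 0)   # hypothetical tool speeds (mm/s)
idle <- idle_time(speed, dt = 0.1, threshold = 2)
print(idle)
```

Here 6 of the 10 samples fall below the threshold, so the tool is idle for about 0.6 s, or 60% of the task time.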

Studies

Load data

df.toolidle <- read_excel('data/surgical_metrics.xlsx', sheet='tool_idle')

Print studies

Author | Year | Study | Journal | Note
Koskinen et al. | 2021 | Movement-level process modeling of microsurgical bimanual and unimanual tasks | International Journal of Computer Assisted Radiology and Surgery | Reports left/right hand separately, results are for left-hand. Paper reports suturing efficiency, which is the inverse of idle time (idle time = 1 - efficiency)
Uemura et al. | 2015 | Procedural surgical skill assessment in laparoscopic training environments | International Journal of Computer Assisted Radiology and Surgery | Reports left/right hand separately, results are for left-hand. Max time given as 420s, every novice exceeded this
D’Angelo et al. | 2015 | Idle time: An underdeveloped performance metric for assessing surgical skill | American Journal of Surgery | Does not report idle time directly per skill group, only number of idle periods. Took values from the first segment, entering tissue with needle. Did not report SD for idle periods, estimated it from the SD of total operative time: SD_idle = M_idle*(SD_time/M_time)
Mackenzie et al. | 2021 | Enhanced Training Benefits of Video Recording Surgery With Automated Hand Motion Analysis | World Journal of Surgery | Values given as means and ranges. Compared experts and residents post-training. SD for idle time not given, estimated from variance of total active time
Oropesa et al. | 2013 | Relevance of Motion-Related Assessment Metrics in Laparoscopic Surgery | Surgical Innovation | Means and SDs estimated from boxplots. Reports dominant and non-dominant hand separately, I picked non-dominant hand. Results for Coordinated pulling task
Hung et al. | 2018 | Development and Validation of Objective Performance Metrics for Robot-Assisted Radical Prostatectomy: A Pilot Study | Journal of Urology | Values given as mean and 95% conf interval. SD calculated from conf interval by sqrt(N)*(upper lim - lower lim)/3.92
Topalli et al. | 2018 | Eye-Hand Coordination Patterns of Intermediate and Novice Surgeons in a Simulation-Based Endoscopic Surgery Training Environment | Journal of Eye Movement Research | Reports “Stand still duration”, which measures the time when tools were still. Corresponds roughly to idle time. Compares novices and intermediates

Results

Run meta-analysis

m.toolidle <- metagen(TE=g,
                 seTE=SDg,
                 studlab=Author,
                 data=df.toolidle,
                 sm="SMD",
                 fixed=FALSE,
                 random=TRUE,
                 method.tau="REML",
                 hakn=TRUE,
                 title="Idle time in Surgery")
summary(m.toolidle)
## Review:     Idle time in Surgery
## 
##                     SMD            95%-CI %W(random)
## Koskinen et al.  2.5600 [ 1.8069; 3.3131]       17.7
## Uemura et al.    1.6933 [ 0.7816; 2.6050]       16.2
## D'Angelo et al.  2.7724 [ 1.0363; 4.5085]        9.3
## Mackenzie et al. 0.3642 [-1.6449; 2.3734]        7.7
## Oropesa et al.   0.9556 [-0.1828; 2.0941]       14.0
## Hung et al.      0.6990 [ 0.2837; 1.1144]       20.8
## Topalli et al.   0.6159 [-0.4828; 1.7147]       14.3
## 
## Number of studies combined: k = 7
## 
##                         SMD           95%-CI    t p-value
## Random effects model 1.3803 [0.5279; 2.2326] 3.96  0.0074
## 
## Quantifying heterogeneity:
##  tau^2 = 0.5520 [0.0915; 4.0820]; tau = 0.7430 [0.3024; 2.0204]
##  I^2 = 75.3% [47.7%; 88.3%]; H = 2.01 [1.38; 2.93]
## 
## Test of heterogeneity:
##      Q d.f. p-value
##  24.29    6  0.0005
## 
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model

Plot forest

forest.meta(m.toolidle, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Idle time in Surgery")

#dev.print(pdf, "figures/forest_toolidle.pdf", width=8, height=8)

Discussion

Not many papers have focused on idle time.

Tool Path Length

Tool path length is the total distance the tools travel during the task.
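
A minimal sketch of how path length is commonly computed from tracked tool-tip positions: the sum of straight-line distances between consecutive samples. The function name and the positions are hypothetical, and individual studies may filter or resample their tracking data before this step.

```r
# Path length = sum of Euclidean distances between consecutive positions.
# `xyz` has one row per sample and one column per coordinate.
path_length <- function(xyz) {
  steps <- diff(as.matrix(xyz))     # row-wise differences between samples
  sum(sqrt(rowSums(steps^2)))
}

# Hypothetical tool-tip positions (mm): a 5 mm move followed by a 12 mm move
pos <- rbind(c(0, 0, 0),
             c(3, 4, 0),
             c(3, 4, 12))
pl <- path_length(pos)
print(pl)
```

For these three points the result is 5 + 12 = 17 mm.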

Studies

Load data

df.toolpl <- read_excel('data/surgical_metrics.xlsx', sheet='tool_path_length')

Print studies

Author | Year | Study | Journal | Note
Aggarwal et al. | 2007 | An evaluation of the feasibility, validity, and reliability of laparoscopic skills assessment in the operating room | Annals of Surgery | Whole procedure, paper reports medians and inter-quartile ranges, the SDs are calculated from these (IQR*(3/4))
Moorthy et al. | 2004 | Bimodal assessment of laparoscopic suturing skills: Construct and concurrent validity | Surgical Endoscopy and Other Interventional Techniques | Box trainer
Smith et al. | 2002 | Motion analysis: A tool for assessing laparoscopic dexterity in the performance of a laboratory-based laparoscopic cholecystectomy | Surgical Endoscopy and Other Interventional Techniques | Multiple tasks, picked Calot’s triangle. Surgeon groups A and C compared
Pagador et al. | 2012 | Decomposition and analysis of laparoscopic suturing task using tool-motion analysis (TMA): Improving the objective assessment | International Journal of Computer Assisted Radiology and Surgery | Study reported left and right hand movements separately, I picked left hand
Goldbraikh et al. | 2021 | Video-based fully automatic assessment of open surgery suturing skills | International Journal of Computer Assisted Radiology and Surgery | Task: Balloon dominant hand
Jimbo et al. | 2017 | A new innovative laparoscopic fundoplication training simulator with a surgical skill validation system | Surgical Endoscopy | Estimated effects and SDs from barplots. Reports left/right hand separately, I used left hand results
Hofstad et al. | 2013 | A study of psychomotor skills in minimally invasive surgery: What differentiates expert and nonexpert performance | Surgical Endoscopy and Other Interventional Techniques | Estimated effects and SDs from barplots. Reports left/right hand separately, I used left hand results
Oropesa et al. | 2013 | Relevance of Motion-Related Assessment Metrics in Laparoscopic Surgery | Surgical Innovation | Means and SDs estimated from boxplots. Reports dominant and non-dominant hand separately, I picked non-dominant hand. Results for Coordinated pulling task
Pellen et al. | 2009 | Laparoscopic surgical skills assessment: Can simulators replace experts? | World Journal of Surgery | Values estimated from boxplots
D’Angelo et al. | 2015 | Idle time: An underdeveloped performance metric for assessing surgical skill | American Journal of Surgery | NA
Hung et al. 2018 Development and Validation of Objective Performance Metrics for Robot-Assisted Radical Prostatectomy: A Pilot Study Journal of Urology Values given as mean and 95% conf interval. SD calculated from conf interval by sqrt(N)*(upper lim - lower lim)/3.92. Results for non-dominant hand
Vedula et al. 2016 Task-Level vs . Segment-Level Quantitative Metrics for Surgical Skill Assessment Journal of Surgical Education Effects and SDs estimated from barplots. Paper does not give Ne/Nn directly, total of 135 trials performed by 14 novices and 4 experts, so I estimated sample sizes by 135(14/(14+4)) for novices and 135(4/(14+4)) for experts
Yamaguchi et al. 2011 Objective assessment of laparoscopic suturing skills using a motion-tracking system Surgical Endoscopy Used results for left hand, for the whole procedure
Harada et al. 2015 Assessing Microneurosurgical Skill with Medico-Engineering Technology World Neurosurgery Results for left hand, estimated from boxplot
Ebina et al. 2021 Motion analysis for better understanding of psychomotor skills in laparoscopy: objective assessment-based simulation training using animal organs Surgical Endoscopy Results for needle holder (left hand), from task 3, knot tying and suturing. Results given in paper as medians and inter-quartile ranges
Chmarra et al. 2010 Objective classification of residents based on their psychomotor laparoscopic skills Surgical Endoscopy and Other Interventional Techniques Values estimated from plots, used the pipe cleaner task results.
Rittenhouse et al. 2014 Design and validation of an assessment tool for open surgical procedures Surgical Endoscopy Used Wii (IR sensor) and Patriot EM tracking. Results are for the Patriot tracking system. Values estimated from barplot (Fig. 6)
Balasundaram et al. 2022 Acquisition of microvascular suturing techniques is feasible using objective measures of performance outside of the operating room British Journal of Oral and Maxillofacial Surgery Results for novices are for post-intervention (training), fig 5. Effects estimated from the figure.
Glarner et al. 2014 Quantifying technical skills during open operations using video-based motion analysis Surgery Effects and SDs estimated from fig 3. Effects are for non-dominant hand (ND). The task was split into four sub-tasks, results here are for suturing, C. Six patients operated on, novice and expert performed the same operation in parallel
Zhenzhu et al. 2020 Feasibility Study of the Low-Cost Motion Tracking System for Assessing Endoscope Holding Skills World Neurosurgery Values estimated from boxplots (Fig. 6), for the 0’ setup.
Franco-González et al. 2021 Development of a 3D Motion Tracking System for the Analysis of Skills in Microsurgery Journal of Medical Systems Values are for the suturing task
Berges et al. 2022 Eye Tracking and Motion Data Predict Endoscopic Sinus Surgery Skill Laryngoscope Participants completed 9 tasks. Results are for total distance in tracker units
Saleh et al. 2006 Evaluating surgical dexterity during corneal suturing Archives of Ophthalmology Values given as medians and inter-quartile ranges. Values are for novice and expert surgeons (Table)
Balal et al. 2019 Computer analysis of individual cataract surgery segments in the operating room Eye (Basingstoke) Results from Table 1 for CCC
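
Several notes in the table above describe the same few conversions. They can be written down as small helpers; the function names below are hypothetical, and the formulas are the ones quoted in the notes (SD = IQR*(3/4), SD = sqrt(N)*(upper lim - lower lim)/3.92, and splitting trials in proportion to group size).

```r
# SD from an inter-quartile range, using the IQR * 3/4 rule from the notes
sd_from_iqr <- function(q1, q3) (q3 - q1) * 3 / 4

# SD from a 95% confidence interval of the mean (interval width = 3.92 * SE)
sd_from_ci95 <- function(lower, upper, n) sqrt(n) * (upper - lower) / 3.92

# Split a total trial count between groups in proportion to group size
split_trials <- function(total, n_group, n_all) total * n_group / n_all

# Vedula et al.: 135 trials by 14 novices and 4 experts
n_novice <- split_trials(135, 14, 14 + 4)
n_expert <- split_trials(135, 4, 14 + 4)
print(c(novice = n_novice, expert = n_expert))  # 105 and 30

# Made-up inputs for the other two helpers
print(sd_from_iqr(q1 = 10, q3 = 18))                 # 6
print(sd_from_ci95(lower = 1, upper = 3, n = 16))
```

These are approximations. The IQR rule in particular assumes roughly normal data, which motion metrics often are not.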

Results

Run meta-analysis

m.toolpl <- metagen(TE=g,
                 seTE=SDg,
                 studlab=Author,
                 data=df.toolpl,
                 sm="SMD",
                 fixed=FALSE,
                 random=TRUE,
                 method.tau="REML",
                 hakn=TRUE,
                 title="Tool path length in Surgery")
summary(m.toolpl)
## Review:     Tool path length in Surgery
## 
##                            SMD             95%-CI %W(random)
## Aggarwal et al.         0.0647 [-0.5606;  0.6900]        5.2
## Moorthy et al.          1.3161 [ 0.2542;  2.3780]        4.1
## Smith et al.            2.6541 [ 1.5194;  3.7889]        3.9
## Pagador et al.          6.2865 [ 2.4827; 10.0904]        0.9
## Goldbraikh et al.       2.0174 [ 1.3325;  2.7024]        5.0
## Jimbo et al.            0.8695 [ 0.1950;  1.5441]        5.1
## Hofstad et al.          0.7989 [-0.1876;  1.7853]        4.3
## Oropesa et al.          0.2889 [-0.8108;  1.3885]        4.0
## Pellen et al.           2.0960 [ 0.9877;  3.2042]        4.0
## D'Angelo et al.         1.7506 [ 0.3193;  3.1819]        3.3
## Hung et al.             1.8000 [ 1.3220;  2.2779]        5.5
## Vedula et al.           2.4204 [ 1.9215;  2.9194]        5.5
## Yamaguchi et al.        3.3661 [ 1.8874;  4.8449]        3.2
## Harada et al.           1.0214 [ 0.3743;  1.6685]        5.1
## Ebina et al.            0.9071 [ 0.1861;  1.6281]        5.0
## Chmarra et al.          1.2076 [ 0.2706;  2.1447]        4.4
## Rittenhouse et al.      2.9945 [ 1.4708;  4.5183]        3.1
## Balasundaram et al.     1.2770 [ 0.3080;  2.2460]        4.3
## Glarner et al.         -0.7531 [-1.9308;  0.4246]        3.8
## Zhenzhu et al.          4.1986 [ 1.6136;  6.7836]        1.6
## Franco-González et al.  1.6536 [ 0.3795;  2.9276]        3.6
## Berges et al.           0.6704 [ 0.3563;  0.9845]        5.8
## Saleh et al.            1.7083 [ 0.6720;  2.7445]        4.2
## Balal et al.            1.2823 [ 0.5994;  1.9652]        5.1
## 
## Number of studies combined: k = 24
## 
##                         SMD           95%-CI    t  p-value
## Random effects model 1.4605 [1.0056; 1.9153] 6.64 < 0.0001
## 
## Quantifying heterogeneity:
##  tau^2 = 0.6387 [0.3758; 2.6781]; tau = 0.7992 [0.6130; 1.6365]
##  I^2 = 79.0% [69.4%; 85.6%]; H = 2.18 [1.81; 2.64]
## 
## Test of heterogeneity:
##       Q d.f.  p-value
##  109.74   23 < 0.0001
## 
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
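
With the Hartung-Knapp adjustment, the confidence interval of the pooled effect is based on a t-distribution with k - 1 degrees of freedom instead of the normal distribution. The sketch below reconstructs the interval from the printed estimate and t value. It is only a consistency check on the summary above, and small differences are expected because the printed t value is rounded.

```r
# Values copied from the tool path length summary above
est    <- 1.4605   # pooled SMD
t_stat <- 6.64     # reported t statistic
k      <- 24       # number of studies

se <- est / t_stat                                   # implied standard error
ci <- est + c(-1, 1) * qt(0.975, df = k - 1) * se    # t-based 95% interval

print(round(ci, 2))  # close to the reported [1.0056; 1.9153]
```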

Plot forest

forest.meta(m.toolpl, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Tool path length in Surgery")

#dev.print(pdf, "figures/forest_toolpl.pdf", width=8, height=8)

Discussion

Tool path length is also a very common metric. Most studies report that novices have a much longer path length, indicating less efficient movements. Results differ based on task and surgical

Tool Velocity

Tool velocity/speed measures how fast the surgical tool or tools are moving.
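
For illustration, speed can be estimated from tracked positions by dividing each step length by the sampling interval. This is a minimal sketch with made-up data; `tool_speed`, `pos` and `dt` are hypothetical names, and individual studies may report mean, peak or per-axis velocity instead.

```r
# Hypothetical helper: per-sample speed of a tracked tool tip.
# `pos` has one row per sample, `dt` is the sampling interval in seconds.
tool_speed <- function(pos, dt) {
  sqrt(rowSums(diff(pos)^2)) / dt
}

# Made-up trajectory sampled every 0.1 s
pos <- cbind(x = c(0, 3, 3), y = c(0, 4, 4), z = c(0, 0, 12))
speed <- tool_speed(pos, dt = 0.1)

print(speed)        # 50 and 120 units per second
print(mean(speed))  # 85
```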

Studies

Load data

df.toolvelocity <- read_excel('data/surgical_metrics.xlsx', sheet='tool_velocity')

Print studies

Author Year Study Journal Note
Davids et al. 2021 Automated vision-based microsurgical skill analysis in neurosurgery using deep learning: Development and preclinical validation. World Neurosurgery Values given as medians
Pastewski et al. 2021 Analysis of Instrument Motion and the Impact of Residency Level and Concurrent Distraction on Laparoscopic Skills Journal of Surgical Education Junior and Senior residents. Did task with and without secondary task (to add distractions). Velocity was reported for three degrees of freedom of motion (yaw, pitch, roll). Results here are for Roll and NO secondary task.
Hwang et al. 2006 Correlating motor performance with surgical error in laparoscopic cholecystectomy Surgical Endoscopy and Other Interventional Techniques NA
Ebina et al. 2021 Motion analysis for better understanding of psychomotor skills in laparoscopy: objective assessment-based simulation training using animal organs Surgical Endoscopy Results for needle holder (left hand), from task 3, knot tying and suturing. Results given in paper as medians and inter-quartile ranges
Jimbo et al. 2017 A new innovative laparoscopic fundoplication training simulator with a surgical skill validation system Surgical Endoscopy Estimated effects and SDs from barplots. Reports left/right hand separately, I used left hand results
Judkins et al. 2009 Objective evaluation of expert and novice performance during robotic surgical training tasks Surgical Endoscopy Estimated effects and SDs from barplots. Compared experts and novices post-training. Results are for bimanual carrying task, which was repeated 3 times by each participant (5 novices 5 experts)
Hofstad et al. 2013 A study of psychomotor skills in minimally invasive surgery: What differentiates expert and nonexpert performance Surgical Endoscopy and Other Interventional Techniques Estimated effects and SDs from barplots. Reports left/right hand separately, I used left hand results
Frasier et al. 2016 A marker-less technique for measuring kinematics in the operating room Surgery (United States) Gives values for grand average and by different tasks. I used grand average results.
Azari et al. 2018 Can surgical performance for varying experience be measured from hand motions? Proceedings of the Human Factors and Ergonomics Society Did not report SDs for motion metrics. I estimated SD from the subjective grading fluidity of motion SD, so acceleration SD = acceleration Mean * (grade SD/grade Mean).
Pagador et al. 2012 Decomposition and analysis of laparoscopic suturing task using tool-motion analysis (TMA): Improving the objective assessment International Journal of Computer Assisted Radiology and Surgery Study reported left and right hand movements separately, I picked left hand. First subtask
Hung et al. 2018 Development and Validation of Objective Performance Metrics for Robot-Assisted Radical Prostatectomy: A Pilot Study Journal of Urology Values given as mean and 95% conf interval. SD calculated from conf interval by sqrt(N)*(upper lim - lower lim)/3.92. Results for non-dominant hand
Mazomenos et al. 2016 Catheter manipulation analysis for objective performance and technical skills assessment in transcatheter aortic valve implantation International Journal of Computer Assisted Radiology and Surgery Task was performed with conventional tools and with robotic tools. Results are for conventional tools. There were 2 stages, results here are for stage 1. SDs evaluated from boxplots (Fig. 5). Expert jerk weirdly small?
Glarner et al. 2014 Quantifying technical skills during open operations using video-based motion analysis Surgery Effects and SDs estimated from fig 3. Effects are for non-dominant hand (ND). The task was split into four sub-tasks, results here are for suturing, C. Six patients operated on, novice and expert performed the same operation in parallel
Zhenzhu et al. 2020 Feasibility Study of the Low-Cost Motion Tracking System for Assessing Endoscope Holding Skills World Neurosurgery Values estimated from boxplots (Fig. 6), for the 0’ setup.
Franco-González et al. 2021 Development of a 3D Motion Tracking System for the Analysis of Skills in Microsurgery Journal of Medical Systems Values are for the suturing task
Berges et al. 2022 Eye Tracking and Motion Data Predict Endoscopic Sinus Surgery Skill Laryngoscope Participants completed 9 tasks. Results are for average velocity in tracker units

Results

Run meta-analysis

m.toolvelocity <- metagen(TE=g,
                 seTE=SDg,
                 studlab=Author,
                 data=df.toolvelocity,
                 sm="SMD",
                 fixed=FALSE,
                 random=TRUE,
                 method.tau="REML",
                 hakn=TRUE,
                 title="Tool velocity in Surgery")
summary(m.toolvelocity)
## Review:     Tool velocity in Surgery
## 
##                            SMD             95%-CI %W(random)
## Davids et al.           0.5140 [-1.5370;  2.5650]        3.9
## Pastewski et al.       -0.7177 [-1.4028; -0.0326]        7.5
## Hwang et al.            6.1176 [ 1.5045; 10.7307]        1.2
## Ebina et al.           -0.8684 [-1.5865; -0.1503]        7.4
## Jimbo et al.           -0.7654 [-1.4334; -0.0974]        7.6
## Judkins et al.          0.6675 [-0.0690;  1.4039]        7.4
## Hofstad et al.          1.0086 [-0.0002;  2.0174]        6.6
## Frasier et al.         -1.1447 [-1.7143; -0.5751]        7.8
## Azari et al.           -0.2982 [-1.2042;  0.6078]        6.9
## Pagador et al.          0.0585 [-1.3278;  1.4448]        5.5
## Hung et al.            -1.6706 [-2.1389; -1.2022]        8.0
## Mazomenos et al.       -1.7056 [-3.0573; -0.3540]        5.6
## Glarner et al.         -0.6245 [-1.7880;  0.5391]        6.2
## Zhenzhu et al.          2.9204 [ 0.8651;  4.9757]        3.9
## Franco-González et al.  1.0406 [-0.1276;  2.2088]        6.1
## Berges et al.          -0.7976 [-1.1148; -0.4803]        8.3
## 
## Number of studies combined: k = 16
## 
##                          SMD            95%-CI     t p-value
## Random effects model -0.2317 [-0.9313; 0.4680] -0.71  0.4912
## 
## Quantifying heterogeneity:
##  tau^2 = 0.9091 [0.5199; 6.3097]; tau = 0.9535 [0.7210; 2.5119]
##  I^2 = 80.8% [69.7%; 87.8%]; H = 2.28 [1.82; 2.86]
## 
## Test of heterogeneity:
##      Q d.f.  p-value
##  77.95   15 < 0.0001
## 
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model

Plot forest

forest.meta(m.toolvelocity, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Tool velocity in Surgery")

#dev.print(pdf, "figures/forest_toolvelocity.pdf", width=8, height=8)

Discussion

Velocity (and related metrics such as acceleration) is a moderately popular metric. Results vary a lot: sometimes novices are faster and sometimes experts are faster. May depend on task?
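
One way to put a number on this variability is the prediction interval that the forest plot draws (prediction=TRUE). The sketch below recomputes it from the printed velocity summary, assuming the Higgins-Thompson-Spiegelhalter formula with k - 2 degrees of freedom. The meta version used for this notebook may apply a slightly different formula, so treat the result as approximate.

```r
# Values copied from the tool velocity summary above
est   <- -0.2317              # pooled SMD
ci_lo <- -0.9313              # lower 95% limit (Hartung-Knapp)
ci_hi <-  0.4680              # upper 95% limit (Hartung-Knapp)
tau2  <-  0.9091              # between-study variance
k     <- 16                   # number of studies

# Standard error implied by the t-based confidence interval (k - 1 d.f.)
se <- (ci_hi - ci_lo) / (2 * qt(0.975, df = k - 1))

# Prediction interval: where the effect of a new, similar study might fall
half_width <- qt(0.975, df = k - 2) * sqrt(tau2 + se^2)
pred <- est + c(-1, 1) * half_width

print(round(pred, 2))
```

The interval spans clearly negative and clearly positive effects, which fits the observation that either group can be the faster one.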

Tool Acceleration

Tool acceleration measures how much the tool/tools accelerate during the task.

Studies

Load data

df.toolacc <- read_excel('data/surgical_metrics.xlsx', sheet='tool_acceleration')

Print studies

Author Year Study Journal Note
Azari et al. 2018 Can surgical performance for varying experience be measured from hand motions? Proceedings of the Human Factors and Ergonomics Society Did not report SDs for motion metrics. I estimated SD from the subjective grading fluidity of motion SD, so acceleration SD = acceleration Mean * (grade SD/grade Mean).
Frasier et al. 2016 A marker-less technique for measuring kinematics in the operating room Surgery (United States) Gives values for grand average and by different tasks. I used grand average results.
Ebina et al. 2021 Motion analysis for better understanding of psychomotor skills in laparoscopy: objective assessment-based simulation training using animal organs Surgical Endoscopy Results for needle holder (left hand), from task 3, knot tying and suturing. Results given in paper as medians and inter-quartile ranges
Pastewski et al. 2021 Analysis of Instrument Motion and the Impact of Residency Level and Concurrent Distraction on Laparoscopic Skills Journal of Surgical Education Junior and Senior residents. Did task with and without secondary task (to add distractions). Acceleration was reported for three degrees of freedom of motion (yaw, pitch, roll). Results here are for Roll and NO secondary task.
Davids et al. 2021 Automated vision-based microsurgical skill analysis in neurosurgery using deep learning: Development and preclinical validation. World Neurosurgery Values given as medians. SD estimated from boxplot
Glarner et al. 2014 Quantifying technical skills during open operations using video-based motion analysis Surgery Effects and SDs estimated from fig 3. Effects are for non-dominant hand (ND). The task was split into four sub-tasks, results here are for suturing, C. Six patients operated on, novice and expert performed the same operation in parallel
Zhenzhu et al. 2020 Feasibility Study of the Low-Cost Motion Tracking System for Assessing Endoscope Holding Skills World Neurosurgery Values estimated from boxplots (Fig. 6), for the 0’ setup. Max acceleration
Franco-González et al. 2021 Development of a 3D Motion Tracking System for the Analysis of Skills in Microsurgery Journal of Medical Systems Values are for the suturing task

Results

Run meta-analysis

m.toolacc <- metagen(TE=g,
                 seTE=SDg,
                 studlab=Author,
                 data=df.toolacc,
                 sm="SMD",
                 fixed=FALSE,
                 random=TRUE,
                 method.tau="REML",
                 hakn=TRUE,
                 title="Tool acceleration in Surgery")
summary(m.toolacc)
## Review:     Tool acceleration in Surgery
## 
##                            SMD             95%-CI %W(random)
## Azari et al.           -0.3713 [-1.2803;  0.5377]       13.9
## Frasier et al.         -1.0298 [-1.5922; -0.4674]       16.7
## Ebina et al.           -0.7891 [-1.5016; -0.0767]       15.5
## Pastewski et al.        0.1911 [-0.4748;  0.8570]       15.9
## Davids et al.          -0.0233 [-2.0633;  2.0167]        6.6
## Glarner et al.         -0.7538 [-1.9316;  0.4240]       11.7
## Zhenzhu et al.          2.1002 [ 0.3361;  3.8643]        7.9
## Franco-González et al.  1.0900 [-0.0852;  2.2652]       11.7
## 
## Number of studies combined: k = 8
## 
##                          SMD            95%-CI     t p-value
## Random effects model -0.1119 [-0.9315; 0.7077] -0.32  0.7563
## 
## Quantifying heterogeneity:
##  tau^2 = 0.5701 [0.1020; 4.2179]; tau = 0.7550 [0.3194; 2.0538]
##  I^2 = 70.0% [37.6%; 85.6%]; H = 1.83 [1.27; 2.63]
## 
## Test of heterogeneity:
##      Q d.f. p-value
##  23.32    7  0.0015
## 
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model

Plot forest

forest.meta(m.toolacc, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Tool acceleration in Surgery")

#dev.print(pdf, "figures/forest_toolacceleration.pdf", width=8, height=8)

Discussion

Few papers have focused on tool acceleration. Jerk (third derivative of position, derivative of acceleration) is much more popular.

Tool Jerk

Jerk is the third derivative of the surgical instrument's position, and measures how smooth the movements are.
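
As an illustration of the definition, jerk can be approximated from evenly sampled positions with third-order finite differences. This is a minimal one-dimensional sketch with synthetic data; `jerk_1d` is a hypothetical helper, and the included studies typically report summary measures (for example mean or integrated squared jerk) that are defined per paper.

```r
# Hypothetical helper: finite-difference jerk for evenly sampled 1D positions
jerk_1d <- function(x, dt) {
  diff(x, differences = 3) / dt^3
}

# Synthetic check: for x(t) = t^3 the third derivative is 6 everywhere
dt <- 0.1
tt <- seq(0, 1, by = dt)
x  <- tt^3
j  <- jerk_1d(x, dt)

print(round(j, 6))  # all values are 6 (up to floating point error)
```

Real tracking data is noisy, and differentiating three times amplifies that noise, so some smoothing is usually applied first.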

Studies

Load data

df.jerk <- read_excel('data/surgical_metrics.xlsx', sheet='tool_jerk')

Print studies

Author Year Study Journal Note
Ghasemloonia et al. 2017 Surgical Skill Assessment Using Motion Quality and Smoothness Journal of Surgical Education Results from task C included. Task had 4 groups of participants, results are from surgeons and residents. 9 trials per participant, 4 participants per group, so n=36 for both groups
Hwang et al. 2006 Correlating motor performance with surgical error in laparoscopic cholecystectomy Surgical Endoscopy and Other Interventional Techniques NA
Ebina et al. 2021 Motion analysis for better understanding of psychomotor skills in laparoscopy: objective assessment-based simulation training using animal organs Surgical Endoscopy Results from task 3, knot tying and suturing. Results given in paper as medians and inter-quartile ranges
Azari et al. 2018 Can surgical performance for varying experience be measured from hand motions? Proceedings of the Human Factors and Ergonomics Society Reported grand average results by skill group and by skill group and task. Results included here are the grand average by skill. Had 4 skill groups, picked medical students and attending surgeons. Paper did not report SDs for motion metrics, so I used the ratio of the subjective evaluation's SD to its mean to estimate the SD. For example, for novices the subjective motion fluidity score was mean=4.1, sd=1.9, so the SD for jerk was calculated as 178.34*(1.9/4.1), i.e. mean jerk * (SD of fluidity score / mean of fluidity score)
Davids et al. 2021 Automated vision-based microsurgical skill analysis in neurosurgery using deep learning: Development and preclinical validation. World Neurosurgery Values given as medians
Oropesa et al. 2013 Relevance of Motion-Related Assessment Metrics in Laparoscopic Surgery Surgical Innovation Means and SDs estimated from boxplots. Reports dominant and non-dominant hand separately, I picked non-dominant hand. Results for Coordinated pulling task.
Maithel et al 2005 Simulated laparoscopy using a head-mounted display vs traditional video monitor: An assessment of performance and muscle fatigue Surgical Endoscopy and Other Interventional Techniques NA
Liang et al. 2018 Motion control skill assessment based on kinematic analysis of robotic end-effector movements The International Journal of Medical Robotics and Computer Assisted Surgery Estimated from boxplots. Reported left/right hand separately, here the results are for left hand
Islam et al. 2016 Affordable, web-based surgical skill training and evaluation tool Journal of Biomedical Informatics Mean values estimated from boxplot. Standard deviations were not given, so I used values similar to those in our study (i = 0): the novices' SD is about 1/5 of the mean, the experts' about 1/12. Measured jerk with “jerkiness score”
Hofstad et al. 2017 Psychomotor skills assessment by motion analysis in minimally invasive surgery on an animal organ Minimally Invasive Therapy and Allied Technologies Values estimated from boxplot, used results for US hook
Shafiel et al. 2017 Motor Skill Evaluation During Robot-Assisted Surgery Volume 5A: 41st Mechanisms and Robotics Conference NA
Chmarra et al. 2010 Objective classification of residents based on their psychomotor laparoscopic skills Surgical Endoscopy and Other Interventional Techniques Values estimated from plots, used the pipe cleaner task results.
Mazomenos et al. 2016 Catheter manipulation analysis for objective performance and technical skills assessment in transcatheter aortic valve implantation International Journal of Computer Assisted Radiology and Surgery Task was performed with conventional tools and with robotic tools. Results are for conventional tools. There were 2 stages, results here are for stage 1. SDs evaluated from boxplots (Fig. 5). Expert jerk weirdly small?
Berges et al. 2022 Eye Tracking and Motion Data Predict Endoscopic Sinus Surgery Skill Laryngoscope Participants completed 9 tasks. Results are for smoothness. Text states that units for smoothness are 1/s^4, but Table 1 says that smoothness is the derivative of acceleration (jerk). Smoothness values are the first ones from Table 2.
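
The Azari et al. note above borrows the relative spread (SD divided by mean) of another measure to stand in for a missing SD. A minimal sketch of that calculation, using the numbers quoted in the note; the helper name is hypothetical.

```r
# Hypothetical helper: scale a mean by the SD/mean ratio of a reference measure
sd_from_reference <- function(target_mean, ref_mean, ref_sd) {
  target_mean * (ref_sd / ref_mean)
}

# Novice jerk mean 178.34, fluidity score mean = 4.1 and sd = 1.9 (from the note)
sd_jerk_novice <- sd_from_reference(178.34, ref_mean = 4.1, ref_sd = 1.9)
print(round(sd_jerk_novice, 2))  # 82.65
```

This assumes the two measures have a similar coefficient of variation, which is a strong assumption and one reason the analysis is labelled informal.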

Results

Run meta-analysis

m.jerk <- metagen(TE=g,
                 seTE=SDg,
                 studlab=Author,
                 data=df.jerk,
                 sm="SMD",
                 fixed=FALSE,
                 random=TRUE,
                 method.tau="REML",
                 hakn=TRUE,
                 title="Jerk in Surgery")
summary(m.jerk)
## Review:     Jerk in Surgery
## 
##                         SMD             95%-CI %W(random)
## Ghasemloonia et al.  1.7090 [ 1.1677;  2.2504]        8.3
## Hwang et al.         2.6183 [ 0.1709;  5.0658]        3.9
## Ebina et al.        -0.9365 [-1.6598; -0.2133]        8.0
## Azari et al.        -0.1972 [-0.8043;  0.4098]        8.2
## Davids et al.        0.1307 [-1.9101;  2.1714]        4.7
## Oropesa et al.      -0.9775 [-2.1179;  0.1629]        6.9
## Maithel et al        1.6060 [ 0.7461;  2.4658]        7.6
## Liang et al.         0.1596 [-0.7184;  1.0377]        7.6
## Islam et al.         3.6094 [ 2.4907;  4.7280]        7.0
## Hofstad et al.       1.3201 [-0.1549;  2.7952]        6.0
## Shafiel et al.       0.4174 [ 0.2951;  0.5397]        8.8
## Chmarra et al.       0.8995 [-0.0026;  1.8015]        7.5
## Mazomenos et al.     1.0303 [-0.1862;  2.2468]        6.7
## Berges et al.        0.4687 [ 0.1586;  0.7788]        8.7
## 
## Number of studies combined: k = 14
## 
##                         SMD           95%-CI    t p-value
## Random effects model 0.7731 [0.0558; 1.4904] 2.33  0.0367
## 
## Quantifying heterogeneity:
##  tau^2 = 1.2252 [0.5180; 3.8165]; tau = 1.1069 [0.7197; 1.9536]
##  I^2 = 85.5% [77.2%; 90.8%]; H = 2.63 [2.10; 3.29]
## 
## Test of heterogeneity:
##      Q d.f.  p-value
##  89.66   13 < 0.0001
## 
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model

Plot forest

forest.meta(m.jerk, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Jerk in Surgery")

#dev.print(pdf, "figures/forest_tooljerk.pdf", width=8, height=8)

Discussion

TBD

Tool Force

Tool force is the force the surgeon uses when they e.g. grasp something using the surgical tools.

Studies

Load data

df.force <- read_excel('data/surgical_metrics.xlsx', sheet='tool_force')

Print studies

Author Year Study Journal Note
Harada et al. 2015 Assessing Microneurosurgical Skill with Medico-Engineering Technology World Neurosurgery Results for needle extraction phase (c), estimated from boxplot. Maximum needle gripping force
Prasad et al. 2016 Objective Assessment of Laparoscopic Force and Psychomotor Skills in a Novel Virtual Reality-Based Haptic Simulator Journal of Surgical Education Results estimated from boxplot. Whole group data (subplot a) reported here.
Horeman et al. 2014 Assessment of Laparoscopic Skills Based on Force and Motion Parameters IEEE Transactions on Biomedical Engineering Results estimated from boxplot, for task 2. Max force values used.
Trejos et al. 2014 Development of force-based metrics for skills assessment in minimally invasive surgery Surgical Endoscopy Used results for max grasp force, values evaluated from Fig. 4 (a). Compared experience level 1 and 6
Woodrow et al. 2007 Training and evaluating spinal surgeons: The development of novel performance measures Spine Values estimated from Fig. 2. Values are mean forces. Compared results for lumbar level L2.
Sugiyama et al. 2018 Forces of Tool-Tissue Interaction to Assess Surgical Skill Level JAMA Surgery Evaluated values from Fig. 3 c. Standardizer, maximum force.
Araki et al. 2017 Comparison of the performance of experienced and novice surgeons: measurement of gripping force during laparoscopic surgery performed on pigs using forceps with pressure sensors Surgical Endoscopy The plot shows that novices grasped with force that is slightly over 8, but the text reports 7.15. Typo in text? SDs evaluated from boxplots. 4 novices and 4 experts, task completed twice.
Prasad et al. 2018 Face and Construct Validity of a Novel Virtual Reality–Based Bimanual Laparoscopic Force-Skills Trainer With Haptics Feedback Surgical Innovation Results are for the suturing task, non-dominant hand, mean needle force. Same dataset as in Prasad (2016)?
Amiel et al. 2020 Experienced surgeons versus novice surgery residents: Validating a novel knot tying simulator for vessel ligation Surgery 4 different knot types, each completed twice. 15 experts and 30 novices. Results are for the deep two hand knot (Fig. 2). Effects estimated from the plot, for Total Force.
Yoshida et al. 2013 Analysis of laparoscopic dissection skill by instrument tip force measurement Surgical Endoscopy Peak horizontal force results used. 10 novices and 10 experts, each performed the task 10 times
de Mathelin et al. 2019 Sensors for expert grip force profiling: Towards benchmarking manual control of a robotic device for surgical tool movements Sensors Results are for non-dominant hand, sensor 6 (S6), which was placed on the ring finger. Expert user was right-handed and novice left-handed. One expert, one novice participant. Expert results are for 12 sessions, novice results are for 10 sessions.

Results

Run meta-analysis

m.force <- metagen(TE=g,
                 seTE=SDg,
                 studlab=Author,
                 data=df.force,
                 sm="SMD",
                 fixed=FALSE,
                 random=TRUE,
                 method.tau="REML",
                 hakn=TRUE,
                 title="Force use in Surgery")
summary(m.force)
## Review:     Force use in Surgery
## 
##                        SMD             95%-CI %W(random)
## Harada et al.       0.5357 [-0.0830;  1.1545]        9.5
## Prasad et al.       1.2450 [ 0.6378;  1.8523]        9.5
## Horeman et al.      2.7082 [ 1.5555;  3.8609]        9.1
## Trejos et al.       1.5351 [ 0.2224;  2.8477]        8.9
## Woodrow et al.      3.8205 [ 2.2438;  5.3972]        8.6
## Sugiyama et al.     0.3071 [-0.8880;  1.5022]        9.0
## Araki et al.        1.4757 [ 0.3564;  2.5950]        9.1
## Prasad et al.      -3.3271 [-4.1910; -2.4633]        9.3
## Amiel et al.        1.1005 [ 0.6332;  1.5678]        9.6
## Yoshida et al.      0.3002 [ 0.0214;  0.5789]        9.6
## de Mathelin et al.  7.0083 [ 4.6980;  9.3186]        7.7
## 
## Number of studies combined: k = 11
## 
##                         SMD            95%-CI    t p-value
## Random effects model 1.4050 [-0.2424; 3.0524] 1.90  0.0866
## 
## Quantifying heterogeneity:
##  tau^2 = 5.3033 [2.4335; 19.0861]; tau = 2.3029 [1.5600; 4.3688]
##  I^2 = 93.6% [90.4%; 95.7%]; H = 3.94 [3.22; 4.83]
## 
## Test of heterogeneity:
##       Q d.f.  p-value
##  155.58   10 < 0.0001
## 
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model

Plot forest

forest.meta(m.force, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Force use in Surgery")

#dev.print(pdf, "figures/forest_toolforce.pdf", width=8, height=8)

Discussion

Forces are analyzed fairly commonly, but often the comparison is not between novices and experts but within tasks, tools, or skill groups.

Task-Evoked Pupil Dilation

Pupil size reflects cognitive workload, stress, and a million other things.

Studies

Load data

df.pupil <- read_excel('data/surgical_metrics.xlsx', sheet='pupil_dilation')

Print studies

| Author | Year | Study | Journal | Note |
|---|---|---|---|---|
| Castner et al. | 2020 | Pupil diameter differentiates expertise in dental radiography visual search | PLOS ONE | Reported values are medians? Median change from baseline |
| Cabrera-Mino et al. | 2019 | Task-Evoked Pupillary Responses in Nursing Simulation as an Indicator of Stress and Cognitive Load | Clinical Simulation in Nursing | There were different tasks, picked the one that had the most significant result. Values estimated from barplot |
| Bednarik et al. | 2018 | Pupil Size As an Indicator of Visual-motor Workload and Expertise in Microsurgical Training Tasks | Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications | Took the segment ‘needle push’, estimated from plots |
| Gunawardena et al. | 2019 | Assessing Surgeons’ Skill Level in Laparoscopic Cholecystectomy using Eye Metrics | Eye Tracking Research and Applications Symposium (ETRA) | Study had only 4 participants of 3 skill levels who completed >=7 tasks each. I picked the least experienced participant and expert E-2. |
| Dilley et al. | 2020 | Visual behaviour in robotic surgery—Demonstrating the validity of the simulated environment | International Journal of Medical Robotics and Computer Assisted Surgery | SDs calculated from inter-quartile ranges (SD = (3/4)*IQR). The paper reports medians. |
| Gao et al. | 2018 | Quantitative evaluations of the effects of noise on mental workloads based on pupil dilation during laparoscopic surgery | American Surgeon | They evaluated different noise conditions, I picked values from the no-noise condition. Paper does not give explicitly the number of participants in groups, only total number (24) which was “divided into experienced and moderately experienced”. I assumed 12 per group |
| Erridge et al. | 2018 | Comparison of gaze behaviour of trainee and experienced surgeons during laparoscopic gastric bypass | British Journal of Surgery | Results for Segment 1, maximum pupil size |
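
The note for Dilley et al. converts an inter-quartile range to an SD with SD = (3/4)*IQR. The sketch below places that rule next to the normal-theory conversion (IQR divided by about 1.349) to show how close the two are. Both helper names and the example IQR are made up for illustration and do not come from any of the papers.

```r
# Approximate an SD from an inter-quartile range (IQR).
# Both helpers are illustrative; the example IQR is invented, not taken from a paper.
sd_from_iqr_rule <- function(iqr) {
  0.75 * iqr                      # rule used in the note: SD = (3/4) * IQR
}

sd_from_iqr_normal <- function(iqr) {
  # For normally distributed data, IQR = 2 * qnorm(0.75) * SD (about 1.349 * SD)
  iqr / (2 * qnorm(0.75))
}

example_iqr <- 2.0                # hypothetical IQR in arbitrary units
print(c(rule   = sd_from_iqr_rule(example_iqr),
        normal = sd_from_iqr_normal(example_iqr)))
```

The two conversions differ by roughly 1%, so the 3/4 rule is effectively the normal-theory approximation. Both assume a roughly symmetric, normal-like distribution, which may not hold for skewed pupil data.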

Results

Run meta-analysis

m.pupil <- metagen(TE=g,
                 seTE=SDg,
                 studlab=Author,
                 data=df.pupil,
                 sm="SMD",
                 fixed=FALSE,
                 random=TRUE,
                 method.tau="REML",
                 hakn=TRUE,
                 title="Pupil dilation in Surgery")
summary(m.pupil)
## Review:     Pupil dilation in Surgery
## 
##                         SMD             95%-CI %W(random)
## Castner et al.       0.7877 [ 0.6671;  0.9083]       15.8
## Cabrera-Mino et al.  0.8255 [ 0.0502;  1.6009]       14.8
## Bednarik et al.     -2.9791 [-3.5250; -2.4332]       15.3
## Gunawardena et al.   1.5927 [ 0.3701;  2.8152]       13.5
## Dilley et al.       -0.0152 [-0.7136;  0.6833]       15.0
## Gao et al.           1.2184 [ 0.3422;  2.0946]       14.6
## Erridge et al.       0.2061 [-1.7697;  2.1820]       11.0
## 
## Number of studies combined: k = 7
## 
##                         SMD            95%-CI    t p-value
## Random effects model 0.2042 [-1.2366; 1.6449] 0.35  0.7406
## 
## Quantifying heterogeneity:
##  tau^2 = 2.3095 [0.8388; 11.0728]; tau = 1.5197 [0.9158; 3.3276]
##  I^2 = 96.7% [95.0%; 97.8%]; H = 5.51 [4.46; 6.81]
## 
## Test of heterogeneity:
##       Q d.f.  p-value
##  182.24    6 < 0.0001
## 
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model

Plot forest

forest.meta(m.pupil, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="Pupil dilation in Surgery")

#dev.print(pdf, "figures/forest_pupil.pdf", width=8, height=8)
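
With `prediction=TRUE` the forest plot also shows a prediction interval. The sketch below approximates such an interval from the printed pupil summary, assuming the common form mu ± t(k-2) * sqrt(tau^2 + SE(mu)^2). Which formula `meta` applies can depend on the package version, so the numbers may differ slightly from the plot.

```r
# Approximate 95% prediction interval from the printed random-effects summary.
# Assumption: mu +/- t_(k-2) * sqrt(tau^2 + SE(mu)^2); values are typed in by hand,
# not read from the m.pupil object.
k        <- 7
mu       <- 0.2042
ci_lower <- -1.2366
ci_upper <- 1.6449
tau2     <- 2.3095

# The Hartung-Knapp CI uses a t distribution with k - 1 d.f., so invert it to get SE(mu)
se_mu <- (ci_upper - ci_lower) / (2 * qt(0.975, df = k - 1))

half_width <- qt(0.975, df = k - 2) * sqrt(tau2 + se_mu^2)
pred_int   <- c(lower = mu - half_width, upper = mu + half_width)
print(round(pred_int, 2))
```

Under these assumptions the interval runs from roughly -4.0 to 4.4, so a new study could plausibly show an effect in either direction.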

Discussion

Prior research indicates that higher stress or cognitive workload leads to larger pupil size, and most studies here show this pattern. In Bednarik et al. (2018), the effect is reversed. For that study, I picked the needle piercing segment (because it was guaranteed to have uninterrupted visual contact from the participant). It may be that experts focused more on this segment and therefore had a larger cognitive workload and larger pupil dilations.
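
One way to see how much the reversed Bednarik et al. result drives the pooled estimate is a leave-one-out check. The sketch below is a rough base R version under two stated assumptions: standard errors are recovered from the printed 95% CIs with a normal approximation, and a DerSimonian-Laird estimator stands in for the REML and Hartung-Knapp settings used in the notebook. `dl_pool` is a hypothetical helper, and its results will not match the `metagen` output exactly.

```r
# Leave-one-out sensitivity check with a DerSimonian-Laird (DL) estimator.
# Assumptions: SEs are back-calculated from the printed 95% CIs, and DL is used
# here instead of the REML + Hartung-Knapp settings of the notebook.
pupil <- data.frame(
  Author = c("Castner", "Cabrera-Mino", "Bednarik", "Gunawardena",
             "Dilley", "Gao", "Erridge"),
  g      = c(0.7877, 0.8255, -2.9791, 1.5927, -0.0152, 1.2184, 0.2061),
  lower  = c(0.6671, 0.0502, -3.5250, 0.3701, -0.7136, 0.3422, -1.7697),
  upper  = c(0.9083, 1.6009, -2.4332, 2.8152, 0.6833, 2.0946, 2.1820)
)
pupil$se <- (pupil$upper - pupil$lower) / (2 * qnorm(0.975))

dl_pool <- function(y, se) {
  w     <- 1 / se^2                                  # inverse-variance weights
  fixed <- sum(w * y) / sum(w)                       # common-effect estimate
  Q     <- sum(w * (y - fixed)^2)                    # Cochran's Q
  C     <- sum(w) - sum(w^2) / sum(w)
  tau2  <- max(0, (Q - (length(y) - 1)) / C)         # DL between-study variance
  w_re  <- 1 / (se^2 + tau2)                         # random-effects weights
  c(pooled = sum(w_re * y) / sum(w_re), tau2 = tau2, Q = Q)
}

all_studies <- dl_pool(pupil$g, pupil$se)
keep        <- pupil$Author != "Bednarik"
no_bednarik <- dl_pool(pupil$g[keep], pupil$se[keep])
print(round(rbind(all_studies, no_bednarik), 3))
```

The Q statistic for all seven studies should land close to the reported 182.24, which is a quick sanity check that the back-calculated standard errors are reasonable.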

Not many studies have measured pupil dilations and compared surgical novices and experts directly. Some used measures such as ICA or entropy (not included here). Pupil dilations are used more often in other fields.

SEI: OSATS

OSATS (Objective Structured Assessment of Technical Skills) is an evaluation instrument that consists of a grading scale and a checklist.

Studies

Load data

df.osats <- read_excel('data/surgical_metrics.xlsx', sheet='scale_OSATS')

Print studies

| Author | Year | Study | Journal | Note |
|---|---|---|---|---|
| Nickel et al. | 2016 | Direct Observation versus Endoscopic Video Recording-Based Rating with the Objective Structured Assessment of Technical Skills for Training of Laparoscopic Cholecystectomy | European Surgical Research | OSATS score from Table 1, direct observation, novices and experts compared |
| Paley et al. | 2021 | Crowdsourced Assessment of Surgical Skill Proficiency in Cataract Surgery | Journal of Surgical Education | Used modified OSATS. SD estimated from Figure 1F. Used expert ratings. |
| Kassab et al. | 2011 | “Blowing up the barriers” in surgical training: Exploring and validating the concept of distributed simulation | Annals of Surgery | Study had two tasks, results are for DS (distributed simulation) because these results were given in the text (box trainer results only as figure). Note that DS was a novel task developed for this study. |
| Black et al. | 2010 | Assessment of surgical competence at carotid endarterectomy under local anaesthesia in a simulated operating theatre | British Journal of Surgery | Results for crisis scenario |
| Willems et al. | 2009 | Assessing Endovascular Skills using the Simulator for Testing and Rating Endovascular Skills (STRESS) Machine | European Journal of Vascular and Endovascular Surgery | Combination of OSATS and some other score? May not be suitable for comparison here. Remove in the future. SDs estimated from Figure 2. |
| Leong et al. | 2008 | Validation of orthopaedic bench models for trauma surgery | Journal of Bone and Joint Surgery - Series B | Used results for DCP, dynamic compression plate. Estimated values from boxplot. |
| Hance et al. | 2005 | Objective assessment of technical skills in cardiac surgery | European Journal of Cardio-thoracic Surgery | Paper reported several tasks, live and blinded scoring. Values here are for LAD anastomosis, blinded scoring. |
| Zevin et al. | 2013 | Development, feasibility, validity, and reliability of a scale for objective assessment of operative performance in laparoscopic gastric bypass surgery | Journal of the American College of Surgeons | Results are for Jejunojejunostomy |
| Hopmans et al. | 2014 | Assessment of surgery residents’ operative skills in the operating theater using a modified Objective Structured Assessment of Technical Skills (OSATS): A prospective multicenter study | Surgery (United States) | Study included various tasks and techniques, results are for laparoscopic cholecystectomy. Novices are PGY1-2 and experts PGY5-6 |
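
The spreadsheet stores a precomputed effect size and standard error per study (the `g` and `SDg` columns used in the analysis). As a rough illustration of how such values could be derived from the group means and SDs noted above, the sketch below computes Hedges' g as novice minus expert with the usual small-sample correction. Whether the spreadsheet uses exactly this formula is an assumption, `hedges_g` is a hypothetical helper, and the input numbers are invented.

```r
# Hedges' g and its standard error from group summaries (novice minus expert).
# hedges_g is a hypothetical helper; the inputs below are invented for illustration.
hedges_g <- function(m_nov, sd_nov, n_nov, m_exp, sd_exp, n_exp) {
  df_total  <- n_nov + n_exp - 2
  sd_pooled <- sqrt(((n_nov - 1) * sd_nov^2 + (n_exp - 1) * sd_exp^2) / df_total)
  d         <- (m_nov - m_exp) / sd_pooled           # Cohen's d
  J         <- 1 - 3 / (4 * df_total - 1)            # small-sample correction factor
  var_d     <- (n_nov + n_exp) / (n_nov * n_exp) + d^2 / (2 * (n_nov + n_exp))
  c(g = J * d, se_g = J * sqrt(var_d))
}

# Hypothetical OSATS-style scores: novices score lower, so g comes out negative
example <- hedges_g(m_nov = 15, sd_nov = 4, n_nov = 10,
                    m_exp = 25, sd_exp = 4, n_exp = 10)
print(round(example, 3))
```

With this sign convention, a metric where experts score higher (such as OSATS) gives negative values, while a metric where novices score higher gives positive values.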

Results

Run meta-analysis

m.osats <- metagen(TE=g,
                 seTE=SDg,
                 studlab=Author,
                 data=df.osats,
                 sm="SMD",
                 fixed=FALSE,
                 random=TRUE,
                 method.tau="REML",
                 hakn=TRUE,
                 title="OSATS in Surgery")
summary(m.osats)
## Review:     OSATS in Surgery
## 
##                    SMD             95%-CI %W(random)
## Nickel et al.  -1.7336 [-2.8266; -0.6406]       12.1
## Paley et al.   -3.8215 [-5.6042; -2.0388]        8.5
## Kassab et al.  -2.1749 [-3.2989; -1.0508]       11.9
## Black et al.   -5.2357 [-7.1431; -3.3283]        7.9
## Willems et al. -3.7687 [-5.6853; -1.8520]        7.9
## Leong et al.   -2.7078 [-4.2138; -1.2017]        9.8
## Hance et al.   -1.0602 [-1.9016; -0.2188]       13.5
## Zevin et al.   -1.8927 [-2.5553; -1.2301]       14.4
## Hopmans et al. -1.8633 [-2.5886; -1.1380]       14.1
## 
## Number of studies combined: k = 9
## 
##                          SMD             95%-CI     t p-value
## Random effects model -2.4486 [-3.3937; -1.5034] -5.97  0.0003
## 
## Quantifying heterogeneity:
##  tau^2 = 0.9152 [0.1979; 5.8046]; tau = 0.9567 [0.4449; 2.4093]
##  I^2 = 67.3% [34.1%; 83.8%]; H = 1.75 [1.23; 2.48]
## 
## Test of heterogeneity:
##      Q d.f. p-value
##  24.48    8  0.0019
## 
## Details on meta-analytical method:
## - Inverse variance method
## - Restricted maximum-likelihood estimator for tau^2
## - Q-profile method for confidence interval of tau^2 and tau
## - Hartung-Knapp adjustment for random effects model
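
With the Hartung-Knapp adjustment, the confidence interval and test of the pooled effect rely on a t distribution with k - 1 degrees of freedom rather than a normal distribution. The sketch below recovers the test statistic and p-value from the printed OSATS summary as a consistency check. The numbers are typed in by hand from the output above, not read from the `m.osats` object.

```r
# Recover the Hartung-Knapp test statistic and p-value from the printed OSATS summary.
k        <- 9
smd      <- -2.4486
ci_lower <- -3.3937
ci_upper <- -1.5034

# The 95% CI half-width equals qt(0.975, k - 1) * SE, so invert it to get the SE
se_smd  <- (ci_upper - ci_lower) / (2 * qt(0.975, df = k - 1))
t_stat  <- smd / se_smd
p_value <- 2 * pt(-abs(t_stat), df = k - 1)          # two-sided p-value
print(round(c(se = se_smd, t = t_stat, p = p_value), 4))
```

The recovered t statistic matches the reported -5.97 after rounding, which supports reading the interval as a t-based one with 8 degrees of freedom.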

Plot forest

forest.meta(m.osats, sortvar=g, prediction=TRUE, print.tau2=TRUE, title="OSATS in Surgery")

#dev.print(pdf, "figures/forest_OSATS.pdf", width=8, height=8)

Discussion

Not many papers have focused on tool accelerations. Jerk (the third derivative of position, i.e. the derivative of acceleration) is much more popular.